Abstract

We address the non-redundant random generation of $k$ words of length $n$ from a
context-free language. Additionally, we want to avoid a predefined set of words. We study
the limits of a rejection-based approach, whose time complexity is shown to grow
exponentially in $k$ in some cases. We propose an alternative recursive algorithm, whose
careful implementation allows for a non-redundant generation of $k$ words of size $n$ in
$O(kn \log n)$ arithmetic operations after the precomputation of $\Theta(n)$ numbers. The
overall complexity is therefore dominated by the generation of $k$ words, and the
non-redundancy comes at a negligible cost.

1 Introduction 

The random generation of combinatorial objects has many direct applications in areas
ranging from software engineering [1] to bioinformatics [12]. It can help formulate
conjectures on the average-case complexity of algorithms, raises new fundamental
mathematical questions, and directly benefits from new discoveries of its fundamental
objects. These include, but are not limited to, generating functionology, arbitrary
precision arithmetics and bijective combinatorics. Following the so-called recursive
framework introduced by Wilf [15], very elegant and general algorithms for the uniform
random generation have been designed [8] and implemented. Many optimizations of this
approach have been developed, using specificities of certain classes of combinatorial
structures [TU], or floating-point arithmetics [5]. More recently a probabilistic approach
to this problem, the so-called Boltzmann generation [5], has drawn much attention both
because of its very low memory complexity and its underlying theoretical beauty.

For many applications, it is necessary to drift away from the uniform models. A striking 
example lies in the most recent paradigm for the in silico analysis of the RNA molecule's folding. 
Instead of trying to predict a structure of minimal free-energy, current approaches tend to focus 
on the ensemble properties of achievable conformations, assuming a Boltzmann probability distri- 
bution [4]. Random generation is then performed, and complex structural features are evaluated 
in a statistical manner. In order to capture such features, a general non-uniform scheme was 
proposed by Denise et al [2], based on the concept of weighted context-free grammars. Recursive 
random generation algorithms were derived, with time and space complexities similar to that of 
the uniform ones [8]. 

In the weighted probability distribution, the probability ratio between the most and least
frequent words sometimes grows exponentially in the size of the generated objects. Therefore it is a
natural question to address the non-redundant random generation of combinatorial objects, 
that is the generation of a set of distinct objects. By contrast to the general case, this aspect 
of random generation has, to our best knowledge, only been addressed through the introduction 



of the PowerSet construct by Zimmermann [T^. An algorithm in 8(ri^) arithmetic operations, 
or a practical complexity in this case, was derived for recursive decomposable structures. 

The absence of redundancy in the set of generated structures was achieved through either
rejection or an unranking algorithm. While the former is discussed later on in the document,
the latter cannot be transposed directly to the case of weighted languages, since the assumption 
that different integral ranks correspond to different objects does not hold. Furthermore, the al- 
gorithm cannot immediately account for an initial set of forbidden words, short of computing an 
intersection grammar, a computationally intensive process that would further induce large time 
and memory constants. 

In this paper, we address the non-redundant generation of words from a context-free
language, generated while avoiding a predefined set $\mathcal{F}$ of forbidden words. First,
we define some concepts and notations, which allow us to rephrase the random generation
process as a step-by-step walk. Then, we investigate the efficiency of a rejection-based
approach to our problem. We show that, although well-suited for the uniform distribution,
the rejection approach can yield high average-case complexities for large sets of forbidden
words. In the weighted case, we show that the complexity of the rejection approach can grow
exponentially in the number of desired sequences. Finally, we introduce a new algorithm,
based on a recursive approach, which generates $k$ sequences of length $n$ while avoiding a
set $\mathcal{F}$ at the cost of $O(kn \log n)$ arithmetic operations after a precomputation
in $\Theta(|\mathcal{F}|n + n)$ arithmetic operations.

2 Concepts and notations 

2.1 Context-free grammars 

We recall some formal definitions on context-free grammars and Chomsky Normal Form (CNF).
A context-free grammar is a 4-tuple $\mathcal{G} = (\Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$ where

• $\Sigma$ is the alphabet, i.e. a finite set of terminal symbols.

• $\mathcal{N}$ is a finite set of non-terminal symbols.

• $\mathcal{P}$ is the finite set of production rules, each of the form $N \to X$, for $N \in \mathcal{N}$ any
non-terminal and $X \in \{\Sigma \cup \mathcal{N}\}^*$.

• $\mathcal{S}$ is the axiom of the grammar, i.e. the initial non-terminal.

A grammar $\mathcal{G}$ is then said to be in Chomsky Normal Form (CNF) iff the rules associated to
each non-terminal $N \in \mathcal{N}$ are either:

• Product case: $N \to N'.N''$

• Union case: $N \to N' \mid N''$

• Terminal case: $N \to t$

for $N, N', N'' \in \mathcal{N}$ non-terminal symbols and $t \in \Sigma$ a terminal symbol. In addition, the axiom
$\mathcal{S} \in \mathcal{N}$ is allowed to derive the empty word $\varepsilon$ only if $\mathcal{S}$ does not appear in the right-hand side of
any production.

Let $\mathcal{L}(N)$ be the language associated to $N \in \mathcal{N}$, that is the set of words on terminal symbols,
accessible through a sequence of derivations starting from $N$. Then the language $\mathcal{L}(\mathcal{G})$ generated
by a grammar $\mathcal{G} = (\Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$ is defined to be the language $\mathcal{L}(\mathcal{S})$ associated with its axiom $\mathcal{S}$.
It is a classic result that any context-free grammar $\mathcal{G}$ can be transformed into a grammar $\mathcal{G}'$ in
Chomsky Normal Form (CNF) such that $\mathcal{L}(\mathcal{G}) = \mathcal{L}(\mathcal{G}')$. Therefore, we will define our algorithm
on CNF grammars, but will nevertheless illustrate its behavior on a compact, non-CNF, grammar
for Motzkin words. Indeed the normalization process introduces numerous non-terminals even
for the most trivial grammars.




Figure 1: Trees of all walks associated with Motzkin words of size $n \in [1,6]$ generated by the
grammar $S \to a\,S\,b\,S \mid c\,S \mid \varepsilon$ under a leftmost-first derivation policy.

2.2 Fixed-length languages descriptions: Immature words 

We call mature word a sequence of terminal symbols. More generally, we will call immature
a word that contains both terminal and non-terminal symbols, thus potentially requiring some
further processing before becoming a mature word. We will denote $\mathcal{L}^{\triangleright}(N)$ the set of immature
words accessible from a non-terminal symbol $N$ and extend this notion to $\mathcal{L}^{\triangleright}(\mathcal{G})$, the immature
words accessible from the axiom of a grammar $\mathcal{G}$. It is noteworthy that $\mathcal{L}(\mathcal{G}) \subset \mathcal{L}^{\triangleright}(\mathcal{G})$.

We can then attach required lengths to the symbols of an immature word. For instance,
in the grammar for Motzkin words from Figure 1, $c\,a\,S_4\,b\,S_0$ will be a specialization of the
immature word $c\,a\,S\,b\,S$ where the words generated from the first (resp. second) instance of
the non-terminal $S$ are required to have length 4 (resp. 0). Formally, this amounts to taking into
consideration couples of the form $(\omega, \mathbf{n})$ where $\omega$ is an immature word, and $\mathbf{n} \in \mathbb{N}^{|\omega|}$ is a vector
of sizes for words generated from the different symbols of $\omega$. We naturally extend the notion of
language associated with an immature word to these couples in the following way:

$$\mathcal{L}(\omega, \mathbf{n}) = \prod_{i \ge 1} \mathcal{L}(\omega_i, n_i)$$

The length vector $\mathbf{n}$ associated with an immature word $\omega$ may be omitted in the rest of the
document for the sake of simplicity.
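The product formula above can be checked by counting. The following is a minimal Python sketch, not part of the original paper: the helper names `motzkin_counts` and `immature_count` are illustrative assumptions, and the grammar is the compact Motzkin grammar $S \to a\,S\,b\,S \mid c\,S \mid \varepsilon$ of Figure 1.

```python
def motzkin_counts(n):
    """counts[m] = number of words of length m derived from S -> a S b S | c S | eps."""
    counts = [1] + [0] * n
    for m in range(1, n + 1):
        counts[m] = counts[m - 1]  # rule S -> c S_(m-1)
        # rule S -> a S_i b S_(m-2-i), for every admissible split i
        counts[m] += sum(counts[i] * counts[m - 2 - i] for i in range(m - 1))
    return counts


def immature_count(symbols, counts):
    """Number of mature words in L(w, n): a product over the symbols of w.

    `symbols` lists (symbol, required length) pairs; a terminal has length 1
    and contributes a single word.
    """
    total = 1
    for symbol, length in symbols:
        total *= counts[length] if symbol == "S" else 1
    return total


counts = motzkin_counts(6)
# c a S_4 b S_0: the first S must yield 4 letters, the second the empty word
example = [("c", 1), ("a", 1), ("S", 4), ("b", 1), ("S", 0)]
print(immature_count(example, counts))
```

In this sketch the language of $c\,a\,S_4\,b\,S_0$ has as many words as there are Motzkin words of length 4, since the second instance of $S$ is forced to derive the empty word.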

2.3 Random generation as a random walk in language space 

An atomic derivation, starting from a word $\omega = \omega'.N.\omega'' \in \{\Sigma \cup \mathcal{N}\}^*$, is the application of
a production $N \to X$ to $\omega$ that replaces $N$ by the right-hand side $X$ of the production, which
yields $\omega \Rightarrow \omega'.X.\omega''$. Let us call derivation policy a deterministic strategy that points, in an
immature word, the first non-terminal to be rewritten through an atomic derivation. Formally,
in the context of a grammar $\mathcal{G}$, a derivation policy is a function $\phi : \mathcal{L}(\mathcal{G}) \cup \mathcal{L}^{\triangleright}(\mathcal{G}) \to \mathbb{N} \cup \{\emptyset\}$ such
that

$$\phi(\omega) = \emptyset \ \text{ if } \omega \in \mathcal{L}(\mathcal{G}), \qquad \phi(\omega) = i \in [1, |\omega|] \ \text{ with } \omega_i \in \mathcal{N} \ \text{ otherwise.}$$

A sequence of atomic derivations is then said to be consistent with a given derivation
policy if the non-terminal rewritten at each step is the one pointed by the policy. A side effect
of this somewhat verbose notion is that it provides a convenient framework for defining the
unambiguity of a grammar without any reference to parse trees.

Definition 2.1. Let $\mathcal{G} = (\Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$ be a context-free grammar and $\phi$ a derivation policy acting
on $\mathcal{G}$. The grammar $\mathcal{G}$ is said to be unambiguous if and only if, for each $\omega \in \mathcal{L}(\mathcal{G})$, there exists only one
sequence of atomic derivations that is consistent with $\phi$ and produces $\omega$ from $\mathcal{S}$.

The derivation leading to a mature word $\omega \in \mathcal{L}(\mathcal{G})$ in a grammar $\mathcal{G}$ can then be associated
in a one-to-one fashion with a walk in the space of languages associated with immature words,
or parse walk, taking steps consistent with a given derivation policy $\phi$. More precisely, such
walks start from the axiom $\mathcal{S}$ of the grammar. From a given immature word $X \in \mathcal{L}^{\triangleright}(\mathcal{G})$, the
derivation policy points at a position $k := \phi(X)$, where a non-terminal $X_k$ can be found. The
parse walk can then be prolonged using one of the derivations acting on $X_k$ (see Figures 1 and 2).



2.4 Weighted context-free grammars 

Definition 2.2 (Weighted Grammar [2]). A weighted grammar is a 5-tuple $\mathcal{G}_\pi = (\pi, \Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$
where $\Sigma$, $\mathcal{N}$, $\mathcal{P}$ and $\mathcal{S}$ are the members of a context-free grammar, and $\pi : \Sigma \to \mathbb{R}$ is a weighting
function, that associates a real-valued weight $\pi_t$ to each terminal symbol $t$.

This notion of weight naturally extends to a mature word $w$ in a multiplicative fashion, i.e.
such that $\pi(w) = \prod_{i=1}^{|w|} \pi_{w_i}$. From this, we can define a $\pi$-weighted probability distribution
over a set of words $\mathcal{L}$, such that the probability associated with $w \in \mathcal{L}$ is

$$\mathbb{P}(w \mid \pi) = \frac{\pi(w)}{\sum_{w' \in \mathcal{L}} \pi(w')}$$

In the rest of the document, we may use $\pi(\omega)$ instead of $\pi(\mathcal{L}(\omega, \mathbf{n}))$ to denote the total weight of
all words derivable from an immature word $\omega$ with associated lengths $\mathbf{n}$.
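As an illustration of the $\pi$-weighted distribution, here is a small Python sketch that enumerates the Motzkin words of a given length and computes the probability of one of them. It is only a brute-force check under stated assumptions: the function names are hypothetical, and the weights $\pi(a) = \pi(b) = 1$, $\pi(c) = 2$ are those used in the caption of Figure 3.

```python
from fractions import Fraction


def motzkin_words(n):
    """Enumerate the words of length n of S -> a S b S | c S | eps (no duplicates)."""
    if n == 0:
        yield ""
        return
    for w in motzkin_words(n - 1):
        yield "c" + w
    for i in range(n - 1):
        for u in motzkin_words(i):
            for v in motzkin_words(n - 2 - i):
                yield "a" + u + "b" + v


def weight(word, pi):
    """Multiplicative weight of a mature word."""
    result = 1
    for letter in word:
        result *= pi[letter]
    return result


def probability(word, language, pi):
    """Probability of `word` in the pi-weighted distribution over `language`."""
    return Fraction(weight(word, pi), sum(weight(w, pi) for w in language))


pi = {"a": 1, "b": 1, "c": 2}
language = list(motzkin_words(6))
print(weight("caccbc", pi), probability("caccbc", language, pi))
```

Exact rationals (`fractions.Fraction`) are used so that the probabilities sum to exactly 1, which is convenient when checking small cases by hand.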



3 Efficiency of a rejection approach for the non-redundant generation

We address the uniform generation of a non-redundant set of words from a language $\mathcal{L}$ with
forbidden words $\mathcal{F} \subset \mathcal{L}$. A rejection approach for this problem consists in drawing words
at random in an unconstrained way, rejecting those previously sampled until $k$ distinct words
are generated, as prescribed by Zimmermann [17] in the case of recursive specifications. For the
generation of words from context-free languages, we refer to previous works of Flajolet et al [3 [6],
or Denise et al [3] that achieve a $O(n^{1+\varepsilon})$ complexity through highly non-trivial floating-point
arithmetics.



3.1 The uniform case 

Theorem 3.1. Let $\mathcal{L}$ be a context-free language, $n \in \mathbb{N}^+$ a positive integer and $\mathcal{F} \subset \mathcal{L}_n$ a
set of forbidden words. Then the rejection approach for the non-redundant uniform random
generation of $k$ words of size $n$ from $\mathcal{L}$ has average-case complexity in

$$O\left(\frac{|\mathcal{L}_n|}{|\mathcal{L}_n| - |\mathcal{F}|}\; n^{1+\varepsilon}\, k \log k\right).$$

Proof. In the uniform model when $\mathcal{F} = \emptyset$, the number of attempts necessary to the generation of
the $i$-th word only depends on $i$ and is independent from prior events. Thus the random variable
$X_{n,k}$ that contains the total number of trials for the generation of $k$ words of size $n$ is such that

$$\mathbb{E}(X_{n,k}) = \sum_{i=0}^{k-1} \frac{l_n}{l_n - i} = l_n\left(\mathcal{H}_{l_n} - \mathcal{H}_{l_n - k}\right)$$

where $l_n := |\mathcal{L}_n|$ is the number of words of size $n$ in the language and $\mathcal{H}_i$ the harmonic number of
order $i$, as pointed out by Flajolet et al. It follows that $\mathbb{E}(X_{n,k})$ is trivially increasing with $k$,
while remaining upper bounded by $k\mathcal{H}_k \in O(k \log k)$ when $k = l_n$ (coupon collector problem).
Since the expected number of rejections due to a non-empty forbidden set $\mathcal{F}$ remains the same
throughout the generation, and does not have any influence over the generated sequences, it can
be considered independently and contributes to a factor $\frac{|\mathcal{L}_n|}{|\mathcal{L}_n| - |\mathcal{F}|}$. Finally, each generation takes
time $O(n^{1+\varepsilon})$, independently from both the generated sequence and the cardinality of $\mathcal{L}_n$. □

The complexity of a rejection approach to this problem is then essentially linear in $k$, up to a
logarithmic factor, unless the set $\mathcal{F}$ overwhelms the language generated by the grammar. In this
case, the generation can become linear in the cardinality of $\mathcal{L}_n$, that is exponential in $n$ for most
languages. Furthermore, the worst-case time complexity of this approach remains unbounded.
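The expected number of trials used in the proof of Theorem 3.1 (case $\mathcal{F} = \emptyset$) can be evaluated exactly for small values. The sketch below is a hedged illustration in Python; `expected_trials` and `harmonic` are hypothetical helper names, and only the sum $\sum_{i=0}^{k-1} l_n/(l_n - i)$ and its harmonic-number form are taken from the text.

```python
from fractions import Fraction


def harmonic(m):
    """Harmonic number H_m as an exact rational."""
    return sum(Fraction(1, j) for j in range(1, m + 1))


def expected_trials(l_n, k):
    """Expected number of uniform draws needed to collect k distinct words
    out of l_n, assuming an empty forbidden set."""
    if not 0 <= k <= l_n:
        raise ValueError("k must lie between 0 and the number of words")
    # once i distinct words are known, a draw is new with probability (l_n - i) / l_n
    return sum(Fraction(l_n, l_n - i) for i in range(k))


# 51 Motzkin words of length 6: cost of collecting 10, then all 51 of them
print(float(expected_trials(51, 10)), float(expected_trials(51, 51)))
```

Collecting the whole language ($k = l_n$) is the coupon collector regime, where the expectation equals $l_n \mathcal{H}_{l_n}$.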



3.2 Weighted context-free languages 

By contrast with the uniform case, the rejection approach to the non-redundant random
generation for weighted context-free languages can yield an exponential complexity, even
when starting from an empty set of forbidden words $\mathcal{F} = \emptyset$. Indeed, the weighted distribution
can specialize into a power-law distribution on the number of occurrences of the terminal symbol
having highest weight, leading to an exponentially-growing number of rejections.
Example: Consider the following grammar, generating the language $a^*b^*$ of words starting with
an arbitrary number of $a$, followed by any number of $b$:

$$S \to a.S \mid T$$
$$T \to b.T \mid \varepsilon$$

We adjoin a weight function $\pi$ to this grammar, such that $\pi(b) := \alpha > 1$ and $\pi(a) := 1$. The
probability of the word $\omega_m := a^{n-m}b^m$ among $S_n$ is then

$$\mathbb{P}(\omega_m) = \frac{\pi(\omega_m)}{\sum_{|\omega| = n} \pi(\omega)} = \frac{\alpha^m}{\sum_{i=0}^{n} \alpha^i} = \frac{\alpha^m(\alpha - 1)}{\alpha^{n+1} - 1} < \alpha^{m-n}.$$

Now consider the set $\mathcal{V}_{n,k} \subset S_n$ of words having less than $n - k$ occurrences of the symbol $b$. The
probability of generating a word from $\mathcal{V}_{n,k}$ is then

$$\mathbb{P}(\mathcal{V}_{n,k}) = \sum_{i=0}^{n-k-1} \mathbb{P}(\omega_{n-k-1-i}) = \frac{\alpha^{n-k} - 1}{\alpha^{n+1} - 1} < \alpha^{-k}.$$

The expected number of generations before a sequence from $\mathcal{V}_{n,k}$ is generated is then
lower-bounded by $\alpha^k$. Since any non-redundant set of $k$ sequences issued from $S_n$ must contain at least
one sequence from $\mathcal{V}_{n,k}$, then the average-case time complexity of the rejection approach is in
$\Omega(n\alpha^k)$, that is exponential in $k$ the number of words.
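The bound on $\mathbb{P}(\mathcal{V}_{n,k})$ can be checked numerically. The following Python sketch assumes the weights of the example ($\pi(a) = 1$, $\pi(b) = \alpha$); the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction


def word_probability(n, m, alpha):
    """Probability of a^(n-m) b^m among the n + 1 words of length n of a*b*."""
    total = sum(alpha**i for i in range(n + 1))
    return Fraction(alpha**m, total)


def low_b_probability(n, k, alpha):
    """P(V_{n,k}): total probability of the words with fewer than n - k letters b."""
    return sum(word_probability(n, m, alpha) for m in range(n - k))


n, alpha = 20, 2
for k in (2, 5, 10):
    p = low_b_probability(n, k, alpha)
    # the expected waiting time before hitting V_{n,k} is 1/p, at least alpha**k
    print(k, float(1 / p), alpha**k)
```

The waiting time $1/\mathbb{P}(\mathcal{V}_{n,k})$ printed by the loop grows geometrically with $k$, which is the source of the exponential behaviour discussed above.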



One may argue that the previous example is not very typical of the rejection algorithm's
behavior on a general context-free language, since the grammar is right-linear and consequently
defines a rational language. By contrast, it can be shown that, under a natural assumption, no
word can asymptotically contribute up to a significant proportion of the distribution in simple
type grammars.





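[Figure 2 graphic. Recoverable content: from $abS_4$, the candidate derivations lead to the
immature words $abcS_3$, $abaS_1bS_1$, $abaS_0bS_2$ and $abaS_2bS_0$, chosen with probabilities

$p_1 = \frac{\pi(abcS_3) - \pi(abcacb) - \pi(abcccc)}{\kappa}$, $p_2 = \frac{\pi(abaS_1bS_1)}{\kappa}$, $p_3 = \frac{\pi(abaS_0bS_2) - \pi(ababab)}{\kappa}$, $p_4 = \frac{\pi(abaS_2bS_0) - \pi(abaccb)}{\kappa}$,

where $\kappa := \pi(abS_4) - \pi(abcacb) - \pi(abcccc) - \pi(ababab) - \pi(abaccb)$.]

Figure 2: Snapshot of a step-by-step random scenario for a Motzkin word of length 6, generated
while avoiding $\mathcal{F}$. From $abS_4$, the recursive approach prescribes that the probabilities associated
with candidate derivations must be proportional to the overall weights for resulting immature
words. Additionally, we need to subtract the contributions of $\mathcal{F}$ to these immature words, which
yields probabilities $(p_1, p_2, p_3, p_4)$.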



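Theorem 3.2. Let $\mathcal{G}_\pi = (\pi, \Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$ be a weighted grammar of simple type¹. Let $\omega_n^0$ be
the word of length $n$ generated from $\mathcal{G}_\pi$ with highest probability (i.e. weight) in the $\pi$-weighted
distribution. Additionally, let us assume that there exist $\alpha, \kappa \in \mathbb{R}^+$ positive constants such that

$$\pi(\omega_n^0) \underset{n \to \infty}{\sim} \kappa\,\alpha^n.$$

Then the probability of $\omega_n^0$ tends to 0 when $n \to \infty$:

$$\mathbb{P}(\omega_n^0 \mid \pi) = \frac{\pi(\omega_n^0)}{\pi(\mathcal{L}(\mathcal{G}_\pi)_n)} \underset{n \to \infty}{\longrightarrow} 0.$$

Proof. From the Drmota-Lalley-Woods theorem [SJ [TTJ [TB], we know that the generating
function of a simple type grammar has a square-root type singularity, and its coefficients admit an
expansion of the form $\frac{\kappa'\beta^n}{n\sqrt{n}}(1 + O(1/n))$. This property holds in the case of weighted context-free
grammars, where the coefficients are now the overall weights $W_n := \pi(\mathcal{L}(\mathcal{G}_\pi)_n)$. Since $\omega_n^0$ is
contributing to $W_n$, then $\pi(\omega_n^0) \le \pi(\mathcal{L}(\mathcal{G}_\pi)_n)$ and therefore $\beta > \alpha$, so that the ratio
$\pi(\omega_n^0)/W_n$ decreases geometrically. □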

Lastly it can be shown that, for any fixed length $n$, the set of words $\mathcal{M}$ having maximal
number of occurrences of a given symbol $t$ has total probability 1 when $\pi(t) \to \infty$. It follows
that sampling more than $|\mathcal{M}|$ words can be extremely time-consuming if one of the weights
dominates the others by several orders of magnitude.



4 Step-by-step approach to the non-redundant random generation

Instead of approaching the random generation from context-free languages by considering
non-terminal symbols as independent generators [5J [2], we consider the random generation
scenarios as random (parse) walks. This makes it possible to determine from which of the local
alternatives the contributions of forbidden (or already generated) words must be subtracted.

¹ A grammar of simple type is mainly a grammar whose dependency graph is strongly-connected and whose
number of words follows an aperiodic progression (see [7] for a more complete definition). Such a grammar can
easily be found for the avatars of the algebraic class of combinatorial structures (Dyck words, Motzkin paths, trees
of fixed degree, ...), all of which can be interpreted as trees.



4.1 The algorithm 

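We propose the following Algorithm A, which from

• a weighted grammar $\mathcal{G}_\pi = (\pi, \Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$,

• an immature word $(\omega, \mathbf{n})$,

• and a set of forbidden words $\mathcal{F} \subset \mathcal{L}((\omega, \mathbf{n}))$,

draws at random a word of $\mathcal{L}(\omega, \mathbf{n}) \backslash \mathcal{F}$ with respect to a $\pi$-weighted distribution:

1. If $\omega$ is a mature word
then if $\mathbf{n} = (1, \ldots, 1)$ and $\omega \notin \mathcal{F}$ then return $\omega$ else Error.

2. Let $N^*_m$ be the non-terminal pointed by $\phi$ in $\omega$, such that $\omega = \omega'.N^*_m.\omega''$, and $k_{\mathcal{F}}$ be the total
weight of words from $\mathcal{F}$ generated from $\omega$.

3. Choose a derivation $\omega \Rightarrow \omega'.X.\omega''$ with the following probabilities, depending on the type of
$N^*$:

• $N^* \to N' \mid N''$: Let $k'_{\mathcal{F}}$ be the sum of weights for words from $\mathcal{F}$ generated from
$\omega'.N'_m.\omega''$, then

$$\mathbb{P}(X = N'_m) = \frac{\pi(\mathcal{L}(\omega'.N'_m.\omega'')) - k'_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}} = 1 - \mathbb{P}(X = N''_m)$$

• $N^* \to N'.N''$: Let $k^i_{\mathcal{F}}$ be the sum of weights for words from $\mathcal{F}$ generated from
$\omega'.N'_i.N''_{m-i}.\omega''$, then

$$\mathbb{P}(X = N'_i.N''_{m-i}) = \frac{\pi(\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'')) - k^i_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}}, \quad i \in [1, m-1]$$

• $N^* \to t$: If $m = 1$ then $\mathbb{P}(X = t) = 1$ else Error.

4. Iterate from step 1.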

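Proposition 4.1. Let $\mathcal{G}_\pi = (\pi, \Sigma, \mathcal{N}, \mathcal{P}, \mathcal{S})$ be a weighted grammar, $\omega \in \{\Sigma \cup \mathcal{N}\}^*$ be an immature
word and $\mathbf{n} \in \mathbb{N}^{|\omega|}$ be a vector of sizes associated with positions of $\omega$.

Then Algorithm A draws a mature word at random according to the $\pi$-weighted distribution from
$\mathcal{L}(\omega, \mathbf{n}) \backslash \mathcal{F}$ or throws an Error if $\mathcal{L}(\omega, \mathbf{n}) = \emptyset$.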

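Proof. The previous claim can be proved very quickly by induction on the number $k$ of required
executions of line 1 before a mature word is issued:

Base: The $k = 0$ case corresponds to an already mature word $\omega$, for which the associated
language is limited to $\{\omega\}$. As $\omega$ is generated by line 1 iff $\omega \notin \mathcal{F}$ then the claimed result holds in
that case.

Inductive step: Assuming that the theorem holds for $k \le n$, we investigate the probabilities of
emission for words that require $k = n + 1$ derivations. Let $N^*_m$ be the non-terminal pointed by
$\phi$, then:

• $N^* \to N' \mid N''$: Assume that the derivation $N^*_m \Rightarrow N'_m$ is chosen w.p.
$\frac{\pi(\mathcal{L}(\omega'.N'_m.\omega'')) - k'_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}}$.
Then the induction hypothesis applies and a word $x$ is generated from $\mathcal{L}(\omega'.N'_m.\omega'') \backslash \mathcal{F}$ in
the $\pi$-weighted distribution. In this distribution applied to $\mathcal{L}(\omega'.N'_m.\omega'') \backslash \mathcal{F}$, the probability
of $x$ is then

$$\mathbb{P}(x \mid \omega'.N'_m.\omega'') = \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_m.\omega'') \backslash \mathcal{F})} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_m.\omega'')) - \pi(\mathcal{L}(\omega'.N'_m.\omega'') \cap \mathcal{F})} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_m.\omega'')) - k'_{\mathcal{F}}}$$

The overall probability of $x$ starting from $N^*_m$ is then

$$\mathbb{P}(x) = \frac{\pi(\mathcal{L}(\omega'.N'_m.\omega'')) - k'_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}} \cdot \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_m.\omega'')) - k'_{\mathcal{F}}} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega) \backslash \mathcal{F})}$$

This property also holds if $N''_m$ is chosen, after pointing out that

$$(k''_{\mathcal{F}} = k_{\mathcal{F}} - k'_{\mathcal{F}}) \Rightarrow 1 - \mathbb{P}(X = N'_m) = \frac{\pi(\mathcal{L}(\omega'.N''_m.\omega'')) - k''_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}}$$

• $N^* \to N'.N''$: For any $i \in [1, m-1]$, a partition $N^*_m \Rightarrow N'_i.N''_{m-i}$ is chosen w.p.
$\frac{\pi(\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'')) - k^i_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}}$. Then the induction hypothesis applies and a word $x$ is generated
from $\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'') \backslash \mathcal{F}$ in the $\pi$-weighted distribution. In this distribution applied to
$\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'') \backslash \mathcal{F}$, the probability of $x$ is then

$$\mathbb{P}(x \mid \omega'.N'_i.N''_{m-i}.\omega'') = \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'') \backslash \mathcal{F})} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'')) - k^i_{\mathcal{F}}}$$

The overall probability of $x$ starting from $N^*_m$ is then

$$\mathbb{P}(x) = \frac{\pi(\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'')) - k^i_{\mathcal{F}}}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}} \cdot \frac{\pi(x)}{\pi(\mathcal{L}(\omega'.N'_i.N''_{m-i}.\omega'')) - k^i_{\mathcal{F}}} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega)) - k_{\mathcal{F}}} = \frac{\pi(x)}{\pi(\mathcal{L}(\omega) \backslash \mathcal{F})}$$

• $N^* \to t$: The probability of any word $x$ issued from $\omega$ is that of the word issued from
$\omega'.t.\omega''$, that is $\frac{\pi(x)}{\pi(\mathcal{L}(\omega) \backslash \mathcal{F})}$ by the induction hypothesis.

□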



This algorithm then performs random generation of $k$ distinct words from a (weighted)
context-free language by setting the initial immature word to $\mathcal{S}_n$, the axiom of $\mathcal{G}_\pi$, adding the freshly
generated sequence to $\mathcal{F}$ at each step.
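The step-by-step generation can be sketched in Python on the compact Motzkin grammar $S \to a\,S\,b\,S \mid c\,S \mid \varepsilon$ with a leftmost-first policy. This is a naive illustration under several assumptions, not the efficient implementation discussed in the next section: the contributions of $\mathcal{F}$ are recomputed by scanning the forbidden set at each step, weights are assumed to be positive integers, and all function names are hypothetical.

```python
import random

WEIGHTS = {"a": 1, "b": 1, "c": 2}  # assumed integer weights, as in Figure 3
STEP = {"a": 1, "b": -1, "c": 0}


def total_weights(n, pi):
    """W[m] = total weight of the Motzkin words of length m."""
    W = [1] + [0] * n
    for m in range(1, n + 1):
        W[m] = pi["c"] * W[m - 1]
        W[m] += sum(pi["a"] * W[i] * pi["b"] * W[m - 2 - i] for i in range(m - 1))
    return W


def is_motzkin(s):
    height = 0
    for ch in s:
        height += STEP[ch]
        if height < 0:
            return False
    return height == 0


def word_weight(word, pi):
    result = 1
    for ch in word:
        result *= pi[ch]
    return result


def derivable(word, prefix, pending):
    """True if the mature `word` belongs to the language of the immature word
    made of `prefix` followed by the (symbol, length) items of `pending`."""
    if not word.startswith(prefix):
        return False
    pos = len(prefix)
    for sym, size in pending:
        chunk = word[pos:pos + size]
        pos += size
        if (sym == "S" and not is_motzkin(chunk)) or (sym != "S" and chunk != sym):
            return False
    return pos == len(word)


def free_weight(prefix, pending, W, pi, forbidden):
    """pi(L(w)) - k_F: weight of the words still allowed from an immature word."""
    total = word_weight(prefix, pi)
    for sym, size in pending:
        total *= W[size] if sym == "S" else pi[sym]
    return total - sum(word_weight(f, pi) for f in forbidden
                       if derivable(f, prefix, pending))


def sample(n, pi, forbidden, rng=random):
    """Draw a Motzkin word of length n outside `forbidden`, pi-weighted."""
    W = total_weights(n, pi)
    prefix, pending = "", [("S", n)]
    if free_weight(prefix, pending, W, pi, forbidden) <= 0:
        raise ValueError("every word of this length is forbidden")
    while pending:
        sym, m = pending.pop(0)          # leftmost symbol
        if sym != "S":
            prefix += sym
            continue
        if m == 0:
            continue                     # S_0 -> eps is the only option
        options = [[("c", 1), ("S", m - 1)]]
        options += [[("a", 1), ("S", i), ("b", 1), ("S", m - 2 - i)]
                    for i in range(m - 1)]
        scores = [free_weight(prefix, opt + pending, W, pi, forbidden)
                  for opt in options]
        r = rng.randrange(sum(scores))   # choose proportionally to the scores
        for opt, score in zip(options, scores):
            if r < score:
                pending = opt + pending
                break
            r -= score
    return prefix


forbidden = {"caccbc", "abaccb", "abcccc"}
for _ in range(5):
    forbidden.add(sample(6, WEIGHTS, forbidden))
```

Because each freshly generated word is added to `forbidden`, the loop at the end produces distinct words without any rejection; exhausting the language eventually raises the error of step 1.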



5 Complexities and data structures 

The algorithm's complexity depends critically on efficient strategies and data structures for: 





[Figure 3 graphic. Panels: Initial tree; a) Generation; b) Weights update. Leaf weights:
$\pi(caccbc) = 16$, $\pi(abaccb) = 4$, $\pi(abcccc) = 16$, $\pi(abcacb) = 4$.]

Figure 3: Prefix tree built from the subset $\mathcal{F} = \{caccbc, abaccb, abcccc\}$ of Motzkin words of
size 6, using weights $\pi(a) = \pi(b) = 1$ and $\pi(c) = 2$. Generation (a) of a new word $abcacb$ and
update (b) of the contributions of $\mathcal{F}$ to the immature words involved in the generation.



1. The weights of languages associated with immature words.

2. The contributions $k_{\mathcal{F}}$ of forbidden (or already generated) words associated with each
immature word $(\omega, \mathbf{n})$.

3. The investigation of the different partitions $N^*_m \Rightarrow N'_i.N''_{m-i}$ in the case of product rules.

4. Big numbers arithmetics.



5.1 Weights of immature languages 

Proposition 5.1. Let $(\omega, \mathbf{n})$ be an immature word and its associated length vector, whose weight
$\pi(\omega, \mathbf{n})$ is known.

Then a pre-computation involving $\Theta(n)$ arithmetic operations makes it possible to compute in
$O(1)$ arithmetic operations the weight of any word $(\omega', \mathbf{n}')$ atomically derived from $(\omega, \mathbf{n})$.

Proof. In order to compute $\pi(\omega, \mathbf{n})$, we first compute the total weights of languages associated
with each non-terminal $N$ for all sizes up to $n = \sum_{i} n_i$, as done by the traditional approach [2].
This can be done in $\Theta(n)$ arithmetic operations, thanks to the holonomic nature of the generating
functions at stake. Indeed, the coefficients of a holonomic function obey a linear recurrence
with polynomial coefficients, which can be algorithmically determined (using the Maple package
GFun [T^, for instance).

We can in turn use these values to compute on the fly the weights of immature words of
interest. Namely, while rewriting an immature word $\omega := \alpha.N^*.\beta$ into $\omega' := \alpha.X.\beta$ through a
derivation $N^* \Rightarrow X$, the new weight $\pi(\omega')$ is given by

$$\pi(\omega') = \pi(\alpha.X.\beta) = \frac{\pi(\alpha)\,\pi(X)\,\pi(\beta)\,\pi(N^*)}{\pi(N^*)} = \frac{\pi(\omega)\,\pi(X)}{\pi(N^*)}$$

where $X$ contains at most two terms (CNF) and therefore a constant number of arithmetic
operations is involved. □
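Both ingredients of this proof can be illustrated on unweighted Motzkin words. The Python sketch below rests on assumptions that are not in the source: it uses the classical recurrence $(m+2)M_m = (2m+1)M_{m-1} + (3m-3)M_{m-2}$ for Motzkin numbers as a stand-in for the kind of linear recurrence a tool such as GFun would derive, and the function names are hypothetical.

```python
def motzkin_quadratic(n):
    """Counts obtained from the grammar itself (convolution, quadratic cost)."""
    counts = [1] + [0] * n
    for m in range(1, n + 1):
        counts[m] = counts[m - 1]
        counts[m] += sum(counts[i] * counts[m - 2 - i] for i in range(m - 1))
    return counts


def motzkin_linear(n):
    """Same counts from a linear recurrence with polynomial coefficients,
    i.e. a constant number of arithmetic operations per term."""
    counts = [1, 1]
    for m in range(2, n + 1):
        counts.append(((2 * m + 1) * counts[m - 1]
                       + (3 * m - 3) * counts[m - 2]) // (m + 2))
    return counts[:n + 1]


def derived_weight(weight_omega, weight_x, weight_nonterminal):
    """pi(w') = pi(w) * pi(X) / pi(N*); exact, since pi(N*) is a factor of pi(w)."""
    return weight_omega * weight_x // weight_nonterminal


M = motzkin_linear(6)
# w = a b S_4 rewritten into w' = a b c S_3 (uniform weights):
# pi(w) = M[4], pi(N*) = M[4], pi(X) = pi(c) * M[3] = M[3]
print(derived_weight(M[4], 1 * M[3], M[4]))
```

The update only involves the weight of the rewritten non-terminal and that of its replacement, which is why it costs a constant number of arithmetic operations once the table is available.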



5.2 A prefix tree for forbidden words 

Proposition 5.2. Let $\mathcal{F}$ be a set of words characterized by their parse walk². Let $\mathcal{T}$ be a prefix
tree built from the parse walks from $\mathcal{F}$ in $\Theta(|\mathcal{F}|.n)$ arithmetic operations. Then for any immature
word $\omega$ reached during a random scenario, the contribution $k_{\mathcal{F}} := \pi(\mathcal{L}(\omega) \cap \mathcal{F})$ of forbidden words
can be computed in $O(1)$ arithmetic operations. Furthermore, updating $\mathcal{T}$ through the addition of
a generated word can be performed in $\Theta(n)$ arithmetic operations.

² Starting from an unparsed raw set of forbidden words will require some additional parsing, which can be
performed in $O(|\mathcal{F}|.n^3)$ by a CYK-type algorithm.

Proof. Thanks to the unambiguity of the grammar, a mature word $v \in \mathcal{L}$ belongs to the language
generated from an immature one $\omega \in \mathcal{L}^{\triangleright}$ iff $\omega$ is found on the parse walk of $v$. Therefore we
gather the parse walks associated with forbidden words $\mathcal{F}$ into a prefix tree, additionally storing
at each node the total weight of all words accessible from it. It is then possible to traverse the
prefix tree during the generation, and to read the values of $k_{\mathcal{F}}$ for candidate derivations on the
children of the current node.

The update of the prefix tree after a generation can be performed in $\Theta(n)$ arithmetic
operations in two phases, as illustrated by Figure 3. First, a top-down stage adds nodes for each
immature word traversed during the generation (a). Then, a bottom-up stage propagates the
weight of the sampled mature word to its ancestors (b). □

Finally, it is remarkable that many of the internal nodes in the trees have degree 1, which
means that their value for $k_{\mathcal{F}}$ is just a copy of their child's own. Since in any tree the number of
internal nodes of degree at least two is strictly smaller than the total number of leaves, then the
number of different values for $k_{\mathcal{F}}$ is bounded by $2|\mathcal{F}|$. It follows that the memory needed to
store the prefix tree will scale like $O(n|\mathcal{F}|)$, even if the encoding of each $k_{\mathcal{F}}$ requires $\Theta(n)$ bits of
storage.
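A prefix tree of parse walks can be sketched as follows in Python, again on the compact Motzkin grammar with a leftmost-first policy. This is an illustrative assumption rather than the paper's data structure: nodes are keyed by a textual label of the immature word (such as `abS4`), which costs more than $O(1)$ per step, degree-1 nodes are not shared, and the names `Node`, `PrefixTree` and `parse_walk` are hypothetical.

```python
STEP = {"a": 1, "b": -1, "c": 0}


def label(prefix, pending):
    """Textual form of an immature word, e.g. 'abS4' for a b S_4."""
    return prefix + "".join(f"S{m}" if s == "S" else s for s, m in pending)


def parse_walk(word):
    """Immature words visited by the leftmost derivation of a Motzkin word."""
    prefix, pending, walk = "", [("S", len(word))], []
    while pending:
        sym, m = pending.pop(0)
        if sym != "S":
            prefix += sym
            continue
        pos = len(prefix)
        if m > 0 and word[pos] == "c":
            pending = [("c", 1), ("S", m - 1)] + pending
        elif m > 0:
            height, j = 1, pos + 1      # locate the b matching this a
            while height:
                height += STEP[word[j]]
                j += 1
            inner = j - pos - 2
            pending = [("a", 1), ("S", inner), ("b", 1),
                       ("S", m - 2 - inner)] + pending
        walk.append(label(prefix, pending))   # m == 0 is the rule S -> eps
    return walk


class Node:
    def __init__(self):
        self.weight = 0      # total weight of the stored words below this node
        self.children = {}


class PrefixTree:
    def __init__(self, pi):
        self.pi, self.root = pi, Node()

    def add(self, word):
        """Insert a word and add its weight to every node of its parse walk."""
        w = 1
        for ch in word:
            w *= self.pi[ch]
        node = self.root
        node.weight += w
        for step in parse_walk(word):
            node = node.children.setdefault(step, Node())
            node.weight += w

    def contribution(self, walk):
        """k_F for the immature word reached by following `walk` from the axiom."""
        node = self.root
        for step in walk:
            node = node.children.get(step)
            if node is None:
                return 0
        return node.weight


tree = PrefixTree({"a": 1, "b": 1, "c": 2})
for f in ("caccbc", "abaccb", "abcccc"):
    tree.add(f)
print(tree.contribution(["aS0bS4", "abS4"]))
```

During a generation one would keep a pointer to the current node instead of replaying the walk from the root, so that reading $k_{\mathcal{F}}$ for a candidate derivation is a single child lookup.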

5.3 Miscellaneous 

For point 3, we can use the so-called Boustrophedon strategy, which allows for a generation in
$O(n \log n)$ arithmetic operations in the worst-case scenario. Since we only restrict the generation
set to authorized (or not previously generated) words, such a property should hold in our case.

For point 4, it is reasonable, for all practical purposes, to assume that the weights are going
to be expressed as rational numbers. Multiplying these weights by the least common multiple of
their denominators yields a new set of integral weights inducing the same probability distribution,
thus arbitrary precision integers can be used. The numbers will scale like $O(\alpha^n)$ for some explicit
$\alpha$, since the resulting language is context-free, and operations performed on such numbers will
take time $O(n \log(n) \log\log(n))$ [14], while the space occupied by their encoding is in $O(n)$.
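The conversion from rational to integral weights is short enough to sketch. The Python snippet below is an illustration only: `integral_weights` is a hypothetical name, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def integral_weights(pi):
    """Scale rational weights by the least common multiple of their denominators.

    For words of a fixed length n, every weight is multiplied by the same
    factor scale**n, so the induced probability distribution is unchanged.
    """
    scale = lcm(*(Fraction(w).denominator for w in pi.values()))
    return {t: int(Fraction(w) * scale) for t, w in pi.items()}


print(integral_weights({"a": Fraction(1, 2), "b": Fraction(1, 3), "c": 1}))
```

Python integers have arbitrary precision, so the scaled weights can be used directly in exact computations of total weights.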

5.4 Summary 

Let $n \in \mathbb{N}^+$ be the overall length for generated words, $k \in \mathbb{N}^+$ the number of distinct generated
words and $\mathcal{F}$ the initial set of forbidden parse walks:

• The time-complexity of Algorithm A is in $\Theta(kn \log(n))$ arithmetic operations in the
worst case scenario, after a pre-processing in $\Theta(|\mathcal{F}|n + n)$ arithmetic operations.

• The memory complexity is in $\Theta(n)$ numbers for the pre-processing, plus $O((|\mathcal{F}| + k)n)$
bits for the storage of the prefix tree.

• For rational-valued weights, using arbitrary precision arithmetics, the associated bit-complexities are
respectively in $\Theta(kn^2 \log(n))$ for time and $O((|\mathcal{F}| + k + n)n)$ for memory.

• Lastly, starting from an empty forbidden set $\mathcal{F} = \emptyset$ yields a generation in $\Theta(kn^2 \log(n))$ for
time and $O(kn + n^2)$ for memory, complexities similar to those of the possibly redundant
traditional approach [31 15].

6 Conclusion and perspectives 

We addressed the random generation of non-redundant sets of sequences from context-free
languages, while avoiding a predefined set of words. We first investigated the efficiency of a rejection
approach. Such an approach was found to be acceptable in the uniform case. By contrast, for
weighted languages, we showed that for some languages the expected number of rejections would
grow exponentially in the desired number of generated sequences. Furthermore, we showed that
in typical context-free languages and for fixed length, the probability distribution can be
dominated by a small number of sequences. We proposed an alternative algorithmic solution for this
problem, based on the so-called recursive approach. The correctness of the algorithm was
demonstrated, and its efficient implementation discussed. This algorithm was shown to achieve the
generation of a non-redundant set of $k$ structures with a time-complexity in $O(kn^2 \log(n))$, while
using $O(kn + n^2)$ bits of storage. These complexities hold in the worst-case scenario, are almost
unaffected by the weight function used, and are equivalent to those of the traditional, possibly
redundant, generation of $k$ words using previous approaches.

One natural extension of the current work concerns the random generation of the more general 
class of decomposable structures [8]. Indeed, such aspects like the pointing and unpointing oper- 
ator are not explicitly accounted for in the current work. Furthermore, the generation of labeled 
structures might be amenable to similar techniques in order to avoid a redundant generation. It 
is unclear however how to extend the notion of parse tree in this context. Isomorphism issues 
might arise, for instance while using the unranking operator. 

Following the remark that the random generation from reasonable specifications is a
numerically stable problem [3], we could envision using arbitrary precision arithmetics to achieve a
$O(kn^{1+\varepsilon})$ complexity. Such a feature could accelerate an implementation of this algorithm, for
instance in the software GenRGenS [12] that already supports the formalism of weighted
grammars. Another direction for an efficient implementation of this approach would be to investigate
the use of Boltzmann samplers [6].

Moreover, the influence of the number of desired sequences, the length and the weights over the
complexity of a rejection-based approach deserves to be further characterized. Namely, are there
simple-type grammars giving rise to an exponential complexity in $k$? Can phase transition-like
phenomena be observed for varying weights?